Comments for MEDB 5502, Week 06

Topics to be covered

  • What you will learn
    • Test of two proportions
    • Chi-square test of independence
    • Odds ratio versus relative risk
    • Concepts behind the logistic regression model
    • Logistic regression with categorical variables
    • Logistic regression with interactions
    • Risk adjustment
    • Diagnostics

Comparing two binary outcomes

  • Is there a difference in the proportion of deaths between male passengers and female passengers on the Titanic?
  • Is there a difference in the proportion of patients finishing the full three doses of HPV vaccine between Black women and White women?
  • Does using an NG tube for feeding in pre-term infants increase the probability of successful breast feeding at six months?

Other comparisons involving a binary outcome

  • Is there a difference in the proportion of deaths between first class, second class, and third class passengers?
  • Does age influence the proportion of women finishing the full three doses of HPV vaccine?
  • Controlling for the mother’s age, does using an NG tube for feeding in pre-term infants increase the probability of successful breast feeding at six months?

Hypothesis framework

  • \(H_0:\ \pi_1=\pi_2\)
  • \(H_1:\ \pi_1 \ne \pi_2\)
  • Compute \(\hat p_1\) and \(\hat p_2\) from samples
  • Accept \(H_0\) if \(\hat p_1-\hat p_2\) is close to zero.
    • \(T=(\hat p_1-\hat p_2)/s.e.\)
    • 95% CI: \((\hat p_1-\hat p_2) \pm Z_{\alpha/2}s.e.\)
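
The framework above can be sketched in Python using the female and male death counts from the Titanic table that appears later in these notes (154 of 462 and 709 of 851). The slides do not say which standard error to use. As an assumption, this sketch uses a pooled standard error for the test statistic and an unpooled one for the confidence interval, which is a common convention, and 1.96 as the approximate critical value.

```python
import math

# Deaths and group sizes from the Titanic table in these notes
deaths_f, n_f = 154, 462
deaths_m, n_m = 709, 851

p_f = deaths_f / n_f
p_m = deaths_m / n_m
diff = p_m - p_f

# Assumption: pooled standard error for the test (H0 implies one common proportion)
p_pool = (deaths_f + deaths_m) / (n_f + n_m)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))
t_stat = diff / se_pooled

# Assumption: unpooled standard error for the 95% confidence interval
se_unpooled = math.sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
z_crit = 1.96  # approximate Z_{alpha/2} for alpha = 0.05
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print(f"p_f = {p_f:.4f}, p_m = {p_m:.4f}, difference = {diff:.4f}")
print(f"T = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

A large value of T and an interval that excludes zero both point away from the null hypothesis.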

Titanic data, 1 of 3

data_dictionary: titanic.txt
description: Mortality among passengers of the Titanic

Titanic data, 2 of 3

Name:
  label: Passenger name
PClass:
  label: Passenger class
  scale: ordinal
  values: 1st, 2nd, 3rd
Age:
  unit: years
  scale: positive discrete
  missing: NA

Titanic data, 3 of 3

Sex:
  scale: binary
  values: female, male
Survived:
  scale: binary
  values:
    '1': yes
    '0': no

Data layout, 1 of 2

Data layout, 2 of 2

Confidence interval and test of hypothesis

Live demo, Test of two proportions

Break #1

  • What you have learned
    • Test of two proportions
  • What’s coming next
    • Chi-square test of independence

Chi-square test of independence, 1 of 2

  • Equivalent to test of two proportions
  • Lay out data in two by two table
\[\begin{matrix} & No\ event & Event \\ Treatment & O_{11} & O_{12}\\ Control & O_{21} & O_{22} \end{matrix}\]

Chi-square test of independence, 2 of 2

\[\begin{matrix} & No\ event & Event \\ Treatment & E_{11} = n_1 (1-\hat p_.) & E_{12}=n_1 \hat p_.\\ Control & E_{21} = n_2 (1-\hat p_.) & E_{22}=n_2 \hat p_. \end{matrix}\]
  • \(X^2=\sum_{i,j} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}\)
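
As a rough illustration, the expected counts and the statistic can be computed by hand for the Titanic sex by survival table given later in these notes. The sketch also computes a two-proportion z statistic with a pooled standard error (an assumption about which standard error is meant) to show the equivalence mentioned earlier: for a two by two table without a continuity correction, the squared z statistic matches the chi-square statistic.

```python
# Observed counts from the Titanic table in these notes
# rows: female, male; columns: survived, died
observed = [[308, 154], [142, 709]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

# Two-proportion z statistic with a pooled standard error, for comparison
p_f, p_m = 154 / 462, 709 / 851
p_pool = col_totals[1] / n
se = (p_pool * (1 - p_pool) * (1 / 462 + 1 / 851)) ** 0.5
z = (p_m - p_f) / se

print(f"X^2 = {x2:.2f}, z^2 = {z * z:.2f}")
```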

Example: Titanic survival by sex

  • Moderate or large sample size: Pearson Chi-Square
  • Small sample size: Fisher’s Exact test

Live demo, Chi-square test of independence

Break #2

  • What you have learned
    • Chi-square test of independence
  • What’s coming next
    • Odds ratio versus relative risk

Titanic data

       Survived   Died  Total
Female   308      154     462
Male     142      709     851
Total    450      863   1,313

Titanic data, odds of death

       Survived   Died  Total  Odds
Female   308      154     462  2     to 1 against
Male     142      709     851  4.993 to 1 in favor
Total    450      863   1,313

Odds ratio = 4.993 / 0.5 = 9.986

Titanic data, probability of death

       Survived   Died  Total  Probability
Female   308      154     462    0.3333
Male     142      709     851    0.8331
Total    450      863   1,313

Relative risk = 0.8331 / 0.3333 = 2.5

Which is better

  • Relative risk is consistent with how most people think, but
    • Relative risk cannot always be computed
    • Relative risk has an ambiguity

Fractions are funny

  ----------  ----------
  0.8  (4/5)  1.25 (5/4)  
  0.75 (3/4)  1.33 (4/3)  
  0.67 (2/3)  1.50 (3/2)  
  0.50 (1/2)  2.00 (2/1)  
  ----------  ----------

Swapping the numerator and denominator

  • Odds ratio = male odds / female odds
    • = 4.993 / 0.5 = 9.986
  • Odds ratio = female odds / male odds
    • = 0.5 / 4.993 = 0.1001
  • Relative risk = male probability / female probability
    • = 0.8331 / 0.3333 = 2.4996
  • Relative risk = female probability / male probability
    • = 0.3333 / 0.8331 = 0.4001
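
The four ratios above can be reproduced directly from the counts. This is a minimal sketch; the slides work from rounded inputs, so the last digit can differ slightly from what the code prints.

```python
# Counts from the Titanic table in these notes
died_f, total_f = 154, 462
died_m, total_m = 709, 851

prob_f = died_f / total_f
prob_m = died_m / total_m
odds_f = died_f / (total_f - died_f)  # 154 / 308
odds_m = died_m / (total_m - died_m)  # 709 / 142

or_m_vs_f = odds_m / odds_f
or_f_vs_m = odds_f / odds_m
rr_m_vs_f = prob_m / prob_f
rr_f_vs_m = prob_f / prob_m

print(f"Odds ratio, male / female:    {or_m_vs_f:.4f}")
print(f"Odds ratio, female / male:    {or_f_vs_m:.4f}")
print(f"Relative risk, male / female: {rr_m_vs_f:.4f}")
print(f"Relative risk, female / male: {rr_f_vs_m:.4f}")
```

Swapping the numerator and denominator always gives the reciprocal, so neither direction carries more information than the other.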

Interpretability, 1 of 3

  • Change from 25% probability to 50% probability
  • Change from 3 to 1 odds against to even odds
    • RR = 2, OR = 3

Interpretability, 2 of 3

  • Change from 25% probability to 75% probability
  • Change from 3 to 1 odds against to 3 to 1 odds in favor
    • RR = 3, OR = 9

Interpretability, 3 of 3

  • Change from 10% probability to 90% probability
  • Change from 9 to 1 odds against to 9 to 1 odds in favor
    • RR = 9, OR = 81
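
The three interpretability examples can be checked with a short helper. The function names are illustrative only.

```python
def odds(prob):
    """Convert a probability to odds in favor."""
    return prob / (1 - prob)

def compare(p_before, p_after):
    """Return (relative risk, odds ratio) for a change in probability."""
    rr = p_after / p_before
    odds_ratio = odds(p_after) / odds(p_before)
    return rr, odds_ratio

for p0, p1 in [(0.25, 0.50), (0.25, 0.75), (0.10, 0.90)]:
    rr, odds_ratio = compare(p0, p1)
    print(f"{p0:.0%} to {p1:.0%}: RR = {rr:.0f}, OR = {odds_ratio:.0f}")
```

The gap between the two measures widens as the probabilities move away from zero.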

Designs that rule out the use of the relative risk, 1 of 2

         Cancer cases  Controls  Total  
Balding       72          82      154  
 Hairy        55          57      112  
 Total       129         139      268  

Designs that rule out the use of the relative risk, 2 of 2

         Heart disease     Healthy     Total  
Balding    127 (9.4%)   1,224 (90.6%)  1,351  
 Hairy     548 (6.7%)   7,611 (93.3%)  8,159  
 Total     675          8,835          9,510  
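
A sketch of the calculations for the two tables above. It assumes the first table is a case-control layout, as its "Cancer cases" and "Controls" columns suggest, so that the row proportions depend on how many cases and controls were sampled and only the odds ratio is computed. For the second table, where row percentages are printed, both measures are computed. The helper names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Proportion of the first-column outcome in row 1 divided by row 2."""
    return (a / (a + b)) / (c / (c + d))

# Table 1 of 2: rows balding, hairy; columns cancer cases, controls
or_table1 = odds_ratio(72, 82, 55, 57)

# Table 2 of 2: rows balding, hairy; columns heart disease, healthy
or_table2 = odds_ratio(127, 1224, 548, 7611)
rr_table2 = relative_risk(127, 1224, 548, 7611)

print(f"Table 1 odds ratio:    {or_table1:.3f}")
print(f"Table 2 odds ratio:    {or_table2:.3f}")
print(f"Table 2 relative risk: {rr_table2:.3f}")
```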

Covariate adjustments

          Children   No children  Total  
Epilepsy  232 (40%)   354 (60%)    586  
Control    79 (72%)    30 (28%)    109  
Total     311         384          695  

Ambiguous and confusing situations

  • One hundred pound sack of potatoes
    • 99% water, 1% potato
    • Weighs 1 pound after completely drying
    • Instead dry until 2% potato
      • How much does it weigh then?
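
A quick check of the arithmetic, assuming that only water is lost during drying so the one pound of potato solids stays fixed:

```python
total_weight = 100.0    # pounds
potato_fraction = 0.01  # 1% potato, 99% water
potato_weight = total_weight * potato_fraction  # the solid part never changes

target_fraction = 0.02  # dry until the sack is 2% potato
new_weight = potato_weight / target_fraction

print(f"Weight at 2% potato: {new_weight:.0f} pounds")
```

Doubling the potato percentage from 1% to 2% halves the total weight, which is why changes in a proportion can feel counterintuitive.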

Example: physician recommendations

                 No cath      Cath      Total  
 Male patient   34  (9.4%)  326 (90.6%)  360  
Female patient  55 (15.3%)  305 (84.7%)  360  
         Total  89          631          720  
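
One way to read the ambiguity in the relative risk is that its value depends on which outcome you choose to count. The sketch below, using the counts in the table above, computes the female versus male relative risk and odds ratio twice: once for "no cath" and once for "cath". This reading of the ambiguity is an assumption; the slides do not spell it out.

```python
# Counts from the physician recommendation table
no_cath_m, cath_m = 34, 326
no_cath_f, cath_f = 55, 305
n_m = no_cath_m + cath_m
n_f = no_cath_f + cath_f

# Relative risk, female vs. male, for each way of naming the outcome
rr_no_cath = (no_cath_f / n_f) / (no_cath_m / n_m)
rr_cath = (cath_f / n_f) / (cath_m / n_m)

# Odds ratio, female vs. male, for each way of naming the outcome
or_no_cath = (no_cath_f / cath_f) / (no_cath_m / cath_m)
or_cath = (cath_f / no_cath_f) / (cath_m / no_cath_m)

print(f"RR of no cath: {rr_no_cath:.3f}   RR of cath: {rr_cath:.3f}")
print(f"OR of no cath: {or_no_cath:.3f}   OR of cath: {or_cath:.3f}")
```

The two odds ratios are reciprocals of each other, while the two relative risks are not, so the relative risk tells a different story depending on the outcome chosen.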

Example: Breast feeding study

           Continued bf  Stopped bf  Total  
Treatment   19 (37.3%)   32 (62.7%)    51  
 Control     5  (8.8%)   52 (91.2%)    57  
  Total     24           84           108  

Break #3

  • What you have learned
    • Odds ratio versus relative risk
  • What’s coming next
    • Concepts behind the logistic regression model

What is logistic regression?

  • Binary outcome
  • Categorical or continuous predictors
  • Linear on the log odds scale

Why log odds?

  • Statistical model of surgery
    • Estimates probability of demise
    • First prediction: probability=1.2
  • Log odds prevent out of range predictions

A linear model for probability, 1 of 2

A linear model for probability, 2 of 2

A multiplicative model for probability

The relationship between odds and probability

  • odds = prob / (1-prob)
  • prob = odds / (1+odds)
    • \(0 \le\) prob \(\le 1\)
    • \(0 \le\) odds \(\le \infty\)
      • \(0 \le\) odds against \(\le 1\)
      • \(1 \le\) odds in favor \(\le \infty\)

A log odds model for probability, 1 of 4

A log odds model for probability, 2 of 4

A log odds model for probability, 3 of 4

A log odds model for probability, 4 of 4

An example of a log odds model with real data, 1 of 3

An example of a log odds model with real data, 2 of 3

  • log odds = -16.72 + 0.577 \(\times\) GA

An example of a log odds model with real data, 3 of 3

  • log odds = -16.72 + 0.577 \(\times\) 30 = 0.59
  • odds = exp(log odds) = 1.8
  • prob = odds / (1+odds) = 0.64
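
The three steps above can be sketched as a small function. The coefficients come from the slide; the other GA values in the loop are illustrative and were chosen only to show that the predicted probability always stays between 0 and 1.

```python
import math

# Coefficients from the slide; GA is the predictor as named there
INTERCEPT = -16.72
SLOPE = 0.577

def predict(ga):
    """Return (log odds, odds, probability) for a given GA."""
    log_odds = INTERCEPT + SLOPE * ga
    odds = math.exp(log_odds)
    prob = odds / (1 + odds)
    return log_odds, odds, prob

for ga in (26, 28, 30, 32, 34):  # illustrative values, not taken from the data
    log_odds, odds, prob = predict(ga)
    print(f"GA = {ga}: log odds = {log_odds:6.2f}, odds = {odds:7.3f}, prob = {prob:.3f}")
```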

Live demo, Concepts behind the logistic regression model

Break #4

  • What you have learned
    • Concepts behind the logistic regression model
  • What’s coming next
    • Logistic regression with categorical variables

Categorical variables in a logistic regression model, 1 of 3

  • 1st class odds: 129/193 = 0.67 or 193/129 = 1.5
  • 2nd class odds: 161/119 = 1.35 or 119/161 = 0.74
  • 3rd class odds: 573/138 = 4.15 or 138/573 = 0.24

Categorical variables in a logistic regression model, 2 of 3

  • 1.50 / 0.24 = 6.212
  • 0.74 / 0.24 = 3.069

Categorical variables in a logistic regression model, 3 of 3

  • 0.74 / 1.50 = 0.494
  • 0.24 / 1.50 = 0.161
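
The odds and odds ratios on these three slides can be reproduced from the counts. The sketch reads 129, 161, and 573 as deaths and 193, 119, and 138 as survivors, which matches the column totals of 863 and 450 in the Titanic table. The choice of reference class is passed in as an argument to show how it changes the ratios.

```python
# (died, survived) by passenger class, read from the odds on the slide
counts = {"1st": (129, 193), "2nd": (161, 119), "3rd": (573, 138)}

# Odds of survival within each class
odds_survival = {cls: survived / died for cls, (died, survived) in counts.items()}

def odds_ratios(reference):
    """Odds ratio of each class relative to the chosen reference class."""
    ref = odds_survival[reference]
    return {cls: o / ref for cls, o in odds_survival.items() if cls != reference}

vs_third = odds_ratios("3rd")
vs_first = odds_ratios("1st")

print("Reference 3rd class:", {k: round(v, 3) for k, v in vs_third.items()})
print("Reference 1st class:", {k: round(v, 3) for k, v in vs_first.items()})
```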

Live demo, Logistic regression with categorical variables

Break #5

  • What you have learned
    • Logistic regression with categorical variables
  • What’s coming next
    • Logistic regression with interactions

Interactions in logistic regression

  • Odds ratios vary by a third factor
  • Interpretation is more tedious

Odds ratio for first class

Odds ratio for second class

Odds ratio for third class

Logistic regression with interaction

  • Odds ratio for 3rd class = 4.608
  • Odds ratio for 1st class = 4.608 \(\times\) 6.572 = 30.3
  • Odds ratio for 2nd class = 4.608 \(\times\) 9.289 = 42.8

Line plot for interaction, 1 of 2

Line plot for interaction, 2 of 2

Live demo, Logistic regression with interactions

Break #6

  • What you have learned
    • Logistic regression with interactions
  • What’s coming next
    • Risk adjustment

Description of bf data, 1 of 11

data_dictionary: bf.csv
description: |
  This data comes from a research study done at Children's Mercy Hospital and St. Luke's Medical Center. This was a study of breast feeding in pre-term infants. Infants were randomized into either a treatment group (NG tube) or a control group (Bottle). Infants in the NG tube group were fed in the hospital via their nasogastric tube when the mother was not available for breast feeding. Infants in the bottle group received bottles when the mothers were not available. Both groups were monitored for six months after discharge from the hospital.

Description of bf data, 2 of 11

feed_typ:
  value: Control Treatment
age_stop:
  label: Age at which infant stopped breast feeding
  scale: non-negative real
  unit: weeks
sepsis:
  label: Diagnosis of sepsis
  value: No Yes
total_ab:
  label: Total number of apnea and bradycardia incidents
  scale: non-negative integer

Description of bf data, 3 of 11

del_type:
  label: Type of delivery
  values:
    Vaginal: 1
    C-section: 2
mom_age:
  label: Mother's age
  unit: years
gravida:
  label: Gravidity or number of pregnancies
  scale: non-negative integer
para:
  label: Parity or number of live births
  scale: non-negative integer

Description of bf data, 4 of 11

mar_st:
  label: Marital status of mother
  values:
    Single: 1
    Married: 2
race:
  label: Mother's race
  values:
    White: W
    Black: B
smoker:
  label: Smoking by mother during pregnancy
  values:
    'TRUE': 1
    'FALSE': 2

Description of bf data, 5 of 11

mi_hosp:
  label: Distance from the mother's home to the hospital
  unit: miles
  scale: non-negative integer
ng_tube:
  label: Time on the NG tube
  unit: days
  scale: non-negative integer
tot_bott:
  label: Bottles of formula given while in the hospital
  scale: non-negative integer

Description of bf data, 6 of 11

bw:
  label: Birthweight
  unit: kg
  scale: non-negative real
gest_age:
  label: Estimated gestational age
  unit: weeks
  scale: positive integer
apgar1:
  label: Apgar score at one minute
  scale: 0 through 10
apgar5:
  label: Apgar score at five minutes
  scale: 0 through 10

Description of bf data, 7 of 11

bf1_wt:
  label: Weight at first breast feeding
  unit: kg
  scale: non-negative real
bf1_age:
  label: Age at first breast feeding
  unit: hours
  scale: positive integer

Description of bf data, 8 of 11

dc_wt:
  label: Weight at discharge
  unit: kg
  scale: positive real
dc_age:
  label: Age at discharge
  unit: days
  scale: positive integer

Description of bf data, 9 of 11

dc3_wt:
  label: Weight three days after discharge
  unit: kg
  scale: positive real
bf0:
  label: Breastfeeding status at hospital discharge
  values:
    Exclusive: 1
    Partial: 2
    None: 4

Description of bf data, 10 of 11

bf1:
  label: Breastfeeding status three days after discharge
  values:
    Exclusive: 1
    Partial: 2
    None: 4
bf2:
  label: Breastfeeding status six weeks after discharge
  values:
    Exclusive: 1
    Partial: 2
    None: 4

Description of bf data, 11 of 11

bf3:
  label: Breastfeeding status three months after discharge
  values:
    Exclusive: 1
    Partial: 2
    None: 4
bf4:
  label: Breastfeeding status six months after discharge
  values:
    Exclusive: 1
    Partial: 2
    None: 4

Creating a binary outcome

Crosstabulation of predictor and outcome

Unadjusted odds ratio

Adjusted odds ratio

Live demo, Risk adjustment

Break #7

  • What you have learned
    • Risk adjustment
  • What’s coming next
    • Diagnostics

Informal sample size calculations, 1 of 2

  • Rule of 50
    • Need 25 to 50 events in each group
    • Based on approximate power calculation
  • Example: newborn readmissions for jaundice
    • Occurs about 2% (1/50) of the time
    • Need 25 \(\times\) 50=1,250 or 50 \(\times\) 50=2,500 in each group

Informal sample size calculations, 2 of 2

  • Rule of 15
    • Need 15 events for each independent variable
    • Smaller ratio implies poor replicability
    • Note: events, not observations
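
Both informal rules reduce to simple arithmetic. The sketch below reproduces the jaundice example from the Rule of 50 and applies the Rule of 15 to a hypothetical study with 60 events. The function names and the count of 60 are made up for illustration.

```python
def rule_of_50_sample_size(events_needed, one_in):
    """Subjects per group so that about `events_needed` events are expected
    when the event occurs once in every `one_in` subjects."""
    return events_needed * one_in

def rule_of_15_max_predictors(events):
    """Largest number of independent variables supported by `events` events."""
    return events // 15

# Newborn readmission for jaundice: about 1 in 50
low = rule_of_50_sample_size(25, 50)
high = rule_of_50_sample_size(50, 50)
print(f"Rule of 50: {low:,} to {high:,} subjects in each group")

# Hypothetical study that observes 60 events (not 60 observations)
max_predictors = rule_of_15_max_predictors(60)
print(f"Rule of 15: at most {max_predictors} independent variables")
```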

Formal power calculation

Assumptions of logistic regression

  • Independence
  • Linearity
    • On a log odds scale
  • No assumptions about normality

Computing probability estimates, male and 3rd class

  • log odds = -2.029
  • odds = exp(log odds) = 0.1315
  • prob = odds / (1+odds) = 0.1162

Computing probability estimates, male and first class

  • log odds = -2.029 + 1.319
  • odds = exp(log odds) = 0.4916
  • prob = odds / (1+odds) = 0.3296

Computing probability estimates, male and second class

  • log odds = -2.029 + 0.25
  • odds = exp(log odds) = 0.1688
  • prob = odds / (1+odds) = 0.1444

Computing probability estimates, female and third class

  • log odds = -2.029 + 1.528
  • odds = exp(log odds) = 0.6059
  • prob = odds / (1+odds) = 0.3773

Computing probability estimates, female and first class

  • log odds = -2.029 + 1.319 + 1.528 + 1.883
  • odds = exp(log odds) = 14.8946
  • prob = odds / (1+odds) = 0.9371

Computing probability estimates, female and second class

  • log odds = -2.029 + 0.25 + 1.528 + 2.229
  • odds = exp(log odds) = 7.2283
  • prob = odds / (1+odds) = 0.8785
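
The six calculations above follow one pattern: add up the relevant log odds terms, exponentiate, and convert to a probability. A minimal sketch, assuming the terms 1.883 and 2.229 are sex by class interaction terms (they appear only in the female first and second class slides):

```python
import math

# Log odds terms taken from the six slides above
INTERCEPT = -2.029  # male, 3rd class
CLASS = {"1st": 1.319, "2nd": 0.25, "3rd": 0.0}
FEMALE = 1.528
FEMALE_BY_CLASS = {"1st": 1.883, "2nd": 2.229, "3rd": 0.0}  # assumed interaction terms

def probability(sex, pclass):
    """Estimated probability for one sex and passenger class combination."""
    log_odds = INTERCEPT + CLASS[pclass]
    if sex == "female":
        log_odds += FEMALE + FEMALE_BY_CLASS[pclass]
    odds = math.exp(log_odds)
    return odds / (1 + odds)

for sex in ("male", "female"):
    for pclass in ("1st", "2nd", "3rd"):
        print(f"{sex:6s} {pclass}: {probability(sex, pclass):.4f}")
```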

Assessing linearity on a log scale, 1 of 3

Assessing linearity on a log scale, 2 of 3

Assessing linearity on a log scale, 3 of 3

How good are your predictions, 1 of 2

How good are your predictions, 2 of 2

Live demo, Diagnostics

Summary

  • What you have learned
    • Test of two proportions
    • Chi-square test of independence
    • Odds ratio versus relative risk
    • Concepts behind the logistic regression model
    • Logistic regression with categorical variables
    • Logistic regression with interactions
    • Risk adjustment
    • Diagnostics

Additional topics??